My daughter is convinced she can tell the difference between M&M colors by their flavor. I’m not buying it. Since I’m a statistician, we devise the following two experiments in an attempt to investigate her claim.
Experiment 2
In this version, I randomly draw 18 M&Ms from the bag, without regard to color, and once again provide them to my blindfolded daughter. She makes her best guess at the color of each M&M and at the conclusion we count the number of correctly determined M&Ms out of 18.
Given these experiments, we want to answer the following questions:
When using \(\alpha = 0.05\) level of significance, what is the critical value for testing the null hypothesis that she is merely guessing (equal likelihood of selecting any given color)? What is the size of the test?
Suppose my daughter can correctly distinguish blue M&Ms from the others, but if the M&M is not blue, she’s merely guessing. What is the statistical power of the testing procedure for this situation? Given the answer, which experiment would be preferred by a statistician (assuming both are equally convenient)?
Suppose my daughter correctly identified 7 M&Ms. Is this sufficient evidence against the null hypothesis that she is merely guessing?
rr exp_1 <- function(h0 = TRUE) { max <- ifelse(h0, 6, 5) truth <- sample(1:max) guess <- sample(1:max) # How many were correctly guessed? sum(guess == truth) + !h0 }
rr sim_1 <- function(h0 = TRUE, n_rounds = 3, n_reps = 10000) { replicate(n_reps, sum(replicate(n_rounds, exp_1(h0)))) }
rr sim_2 <- function(h0 = TRUE, n_samples = 18, n_reps = 10000) { if (h0) { # Random guess (1/6 likelihood of success) rbinom(n_reps, n_samples, p = 1/6) } else { # How many blues are included - these are always correctly determined blues <- rbinom(n_reps, n_samples, p = 1/6)
# The ramaining m&ms after blues are accounted for are randomly guessed
others <- rbinom(n_reps, n_samples - blues, p = 1/5)
blues + others
} }
rr h0_dist_1 <- sim_1() ha_dist_1 <- sim_1(FALSE)
rr h0_dist_2 <- sim_2() ha_dist_2 <- sim_2(FALSE)
Now that we have these distributions, we can answer the questions.
rr alpha <- 0.05 # Critical value (crit_val_1 <- quantile(h0_dist_1, probs = 1 - alpha, type = 1))
95%
6
rr # Size (probability of Type I error (false rejection of null hypothesis)) mean(h0_dist_1 >= crit_val_1)
[1] 0.0826
rr # Critical value (crit_val_2 <- quantile(h0_dist_2, probs = 1 - alpha, type = 1))
95%
6
rr # Size (probability of Type I error (false rejection of null hypothesis)) mean(h0_dist_2 > crit_val_2)
[1] 0.0203
Statistical power is “the probability the test correctly rejects \(H_0\) for a given \(H_A\).”
rr prop_ci <- function(x) { est <- mean(x) ci <- qnorm(0.975) * sqrt(est * (1 - est) / length(x)) c( lower = est - ci, estimate = est, upper = est + ci ) }
rr (power_1 <- prop_ci(ha_dist_1 > crit_val_1))
lower estimate upper
0.339459 0.348800 0.358141

rr
power_2 <- prop_ci(ha_dist_2 >= crit_val_2)
power_2
lower estimate upper
0.3804403 0.3900000 0.3995597
rr
# Number of M&Ms correctly identified
observed_stat <- 7
p_val_1 <- prop_ci(h0_dist_1 >= observed_stat)
p_val_1
lower estimate upper
0.02731899 0.03070000 0.03408101
rr
p_val_2 <- prop_ci(h0_dist_2 >= observed_stat)
p_val_2
lower estimate upper
0.02015572 0.02310000 0.02604428
LS0tCnRpdGxlOiAiTSZNIFN0dWR5IgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgpgYGB7ciBzZXR1cCwgaW5jbHVkZSA9IEZBTFNFfQprbml0cjo6b3B0c19jaHVuayRzZXQoY2FjaGUgPSBUUlVFKQpgYGAKCk15IGRhdWdodGVyIGlzIGNvbnZpbmNlZCBzaGUgY2FuIHRlbGwgdGhlIGRpZmZlcmVuY2UgYmV0d2VlbiBNJk0gY29sb3JzIGJ5IHRoZWlyCmZsYXZvci4gSSdtIG5vdCBidXlpbmcgaXQuIFNpbmNlIEknbSBhIHN0YXRpc3RpY2lhbiwgd2UgZGV2aXNlIHRoZSBmb2xsb3dpbmcKdHdvIGV4cGVyaW1lbnRzIGluIGFuIGF0dGVtcHQgdG8gaW52ZXN0aWdhdGUgaGVyIGNsYWltLgoKIyMjIEV4cGVyaW1lbnQgMQpUaGVyZSBhcmUgNiBNJk0gY29sb3JzLiBJbiB0aGlzIGV4cGVyaW1lbnQsIEkgc2VsZWN0IG9uZSBNJk0gb2YgZWFjaCBjb2xvciBhbmQgCmdpdmUgaXQgdG8gbXkgZGF1Z2h0ZXIgd2hpbGUgc2hlIGlzIGJsaW5kZm9sZGVkLiBTaGUgZ3Vlc3NlcyB0aGUgY29sb3Igb2YgZWFjaApNJk0gYXMgc2hlIHRhc3RlcyBpdC4gV2UgYWdyZWUgdGhhdCBhZnRlciBzaGUgZ3Vlc3NlcyBhIGNvbG9yLCBzaGUgd2lsbCBub3QgZ3Vlc3MKaXQgYWdhaW4gaW4gdGhhdCByb3VuZC4gV2UgcGVyZm9ybSB0aGlzIGV4ZXJjaXNlIGZvciB0aHJlZSByb3VuZHMsIHNvIGEgdG90YWwgb2YgCjE4IE0mTXMgYXJlIGNvbnN1bWVkLiBBdCB0aGUgZW5kLCB3ZSBjb3VudCB0aGUgbnVtYmVyIG9mIGNvbG9ycyBzaGUgY29ycmVjdGx5IApkZXRlcm1pbmVkIG91dCBvZiAxOC4KCiMjIyBFeHBlcmltZW50IDIKSW4gdGhpcyB2ZXJzaW9uLCBJIHJhbmRvbWx5IGRyYXcgMTggTSZNcyBmcm9tIHRoZSBiYWcsIHdpdGhvdXQgcmVnYXJkIHRvIGNvbG9yLCAKYW5kIG9uY2UgYWdhaW4gcHJvdmlkZSB0aGVtIHRvIG15IGJsaW5kZm9sZGVkIGRhdWdodGVyLiBTaGUgbWFrZXMgaGVyIGJlc3QgZ3Vlc3MKYXQgdGhlIGNvbG9yIG9mIGVhY2ggTSZNIGFuZCBhdCB0aGUgY29uY2x1c2lvbiB3ZSBjb3VudCB0aGUgbnVtYmVyIG9mIGNvcnJlY3RseQpkZXRlcm1pbmVkIE0mTXMgb3V0IG9mIDE4LgoKR2l2ZW4gdGhlc2UgZXhwZXJpbWVudHMsIHdlIHdhbnQgdG8gYW5zd2VyIHRoZSBmb2xsb3dpbmcgcXVlc3Rpb25zOgoKMS4gV2hlbiB1c2luZyAkXGFscGhhID0gMC4wNSQgbGV2ZWwgb2Ygc2lnbmlmaWNhbmNlLCB3aGF0IGlzIHRoZSBjcml0aWNhbCB2YWx1ZSAKZm9yIHRlc3RpbmcgdGhlIG51bGwgaHlwb3RoZXNpcyB0aGF0IHNoZSBpcyBtZXJlbHkgZ3Vlc3NpbmcgKGVxdWFsIGxpa2VsaWhvb2Qgb2YKc2VsZWN0aW5nIGFueSBnaXZlbiBjb2xvcik/IFdoYXQgaXMgdGhlIHNpemUgb2YgdGhlIHRlc3Q/CgoyLiBTdXBwb3NlIG15IGRhdWdodGVyIGNhbiBjb3JyZWN0bHkgZGlzdGluZ3Vpc2ggYmx1ZSBNJk1zIGZyb20gdGhlIG90aGVycywgYnV0CmlmIHRoZSBNJk0gaXMgbm90IGJsdWUsIHNoZSdzIG1lcmVseSBndWVzc2luZy4gV2hhdCBpcyB0aGUgc3RhdGlzdGljYWwgcG93ZXIgb2YgCnRoZSB0ZXN0aW5nIHByb2NlZHVyZSBmb3IgdGhpcyBzaXR1YXRpb24/IEdpdmVuIHRoZSBhbnN3ZXIsIHdoaWNoIGV4cGVyaW1lbnQKd291bGQgYmUgcHJlZmVycmVkIGJ5IGEgc3RhdGlzdGljaWFuIChhc3N1bWluZyBib3RoIGFyZSBlcXVhbGx5IGNvbnZlbmllbnQpPwoKMy4gU3VwcG9zZSBteSBkYXVnaHRlciBjb3JyZWN0bHkgaWRlbnRpZmllZCA3IE0mTXMuIElzIHRoaXMgc3VmZmljaWVudCBldmlkZW5jZSAKYWdhaW5zdCB0aGUgbnVsbCBoeXBvdGhlc2lzIHRoYXQgc2hlIGlzIG1lcmVseSBndWVzc2luZz8KCmBgYHtyfQpleHBfMSA8LSBmdW5jdGlvbihoMCA9IFRSVUUpIHsKICBtYXggPC0gaWZlbHNlKGgwLCA2LCA1KQogIHRydXRoIDwtIHNhbXBsZSgxOm1heCkKICBndWVzcyA8LSBzYW1wbGUoMTptYXgpCiAgIyBIb3cgbWFueSB3ZXJlIGNvcnJlY3RseSBndWVzc2VkPwogIHN1bShndWVzcyA9PSB0cnV0aCkgKyAhaDAKfQpgYGAKCmBgYHtyfQpzaW1fMSA8LSBmdW5jdGlvbihoMCA9IFRSVUUsIG5fcm91bmRzID0gMywgbl9yZXBzID0gMTAwMDApIHsKICByZXBsaWNhdGUobl9yZXBzLCBzdW0ocmVwbGljYXRlKG5fcm91bmRzLCBleHBfMShoMCkpKSkKfQpgYGAKCmBgYHtyfQpzaW1fMiA8LSBmdW5jdGlvbihoMCA9IFRSVUUsIG5fc2FtcGxlcyA9IDE4LCBuX3JlcHMgPSAxMDAwMCkgewogIGlmIChoMCkgewogICAgIyBSYW5kb20gZ3Vlc3MgKDEvNiBsaWtlbGlob29kIG9mIHN1Y2Nlc3MpCiAgICByYmlub20obl9yZXBzLCBuX3NhbXBsZXMsIHAgPSAxLzYpCiAgfSBlbHNlIHsKICAgICMgSG93IG1hbnkgYmx1ZXMgYXJlIGluY2x1ZGVkIC0gdGhlc2UgYXJlIGFsd2F5cyBjb3JyZWN0bHkgZGV0ZXJtaW5lZAogICAgYmx1ZXMgPC0gcmJpbm9tKG5fcmVwcywgbl9zYW1wbGVzLCBwID0gMS82KQogICAgCiAgICAjIFRoZSByYW1haW5pbmcgbSZtcyBhZnRlciBibHVlcyBhcmUgYWNjb3VudGVkIGZvciBhcmUgcmFuZG9tbHkgZ3Vlc3NlZAogICAgb3RoZXJzIDwtIHJiaW5vbShuX3JlcHMsIG5fc2FtcGxlcyAtIGJsdWVzLCBwID0gMS81KQogICAgYmx1ZXMgKyBvdGhlcnMKICB9Cn0KYGBgCgpgYGB7cn0KaDBfZGlzdF8xIDwtIHNpbV8xKCkKaGFfZGlzdF8xIDwtIHNpbV8xKEZBTFNFKQpgYGAKCmBgYHtyfQpoMF9kaXN0XzIgPC0gc2ltXzIoKQpoYV9kaXN0XzIgPC0gc2ltXzIoRkFMU0UpCmBgYAoKTm93IHRoYXQgd2UgaGF2ZSB0aGVzZSBkaXN0cmlidXRpb25zLCB3ZSBjYW4gYW5zd2VyIHRoZSBxdWVzdGlvbnMuCgpgYGB7cn0KYWxwaGEgPC0gMC4wNQojIENyaXRpY2FsIHZhbHVlCihjcml0X3ZhbF8xIDwtIHF1YW50aWxlKGgwX2Rpc3RfMSwgcHJvYnMgPSAxIC0gYWxwaGEsIHR5cGUgPSAxKSkKIyBTaXplIChwcm9iYWJpbGl0eSBvZiBUeXBlIEkgZXJyb3IgKGZhbHNlIHJlamVjdGlvbiBvZiBudWxsIGh5cG90aGVzaXMpKQptZWFuKGgwX2Rpc3RfMSA+PSBjcml0X3ZhbF8xKQpgYGAKCmBgYHtyfQojIENyaXRpY2FsIHZhbHVlCihjcml0X3ZhbF8yIDwtIHF1YW50aWxlKGgwX2Rpc3RfMiwgcHJvYnMgPSAxIC0gYWxwaGEsIHR5cGUgPSAxKSkKIyBTaXplIChwcm9iYWJpbGl0eSBvZiBUeXBlIEkgZXJyb3IgKGZhbHNlIHJlamVjdGlvbiBvZiBudWxsIGh5cG90aGVzaXMpKQptZWFuKGgwX2Rpc3RfMiA+PSBjcml0X3ZhbF8yKQpgYGAKClN0YXRpc3RpY2FsIHBvd2VyIGlzICJ0aGUgcHJvYmFiaWxpdHkgdGhlIHRlc3QgY29ycmVjdGx5IHJlamVjdHMgJEhfMCQgZm9yIGEgZ2l2ZW4KJEhfQSQuIgoKYGBge3J9CnByb3BfY2kgPC0gZnVuY3Rpb24oeCkgewogIGVzdCA8LSBtZWFuKHgpCiAgY2kgPC0gcW5vcm0oMC45NzUpICogc3FydChlc3QgKiAoMSAtIGVzdCkgLyBsZW5ndGgoeCkpCiAgYygKICAgIGxvd2VyID0gZXN0IC0gY2ksCiAgICBlc3RpbWF0ZSA9IGVzdCwKICAgIHVwcGVyID0gZXN0ICsgY2kKICApCn0KYGBgCgoKYGBge3J9Cihwb3dlcl8xIDwtIHByb3BfY2koaGFfZGlzdF8xID49IGNyaXRfdmFsXzEpKQpgYGAKCmBgYHtyLCBtZXNzYWdlID0gRkFMU0V9CmxpYnJhcnkodGlkeXZlcnNlKQp0aWJibGUobnVsbCA9IGgwX2Rpc3RfMSwKICAgICAgIGFsdCA9IGhhX2Rpc3RfMSkgJT4lIAogIGdhdGhlcigpICU+JSAKICBjb3VudChrZXksIHZhbHVlKSAlPiUgCiAgY29tcGxldGUoa2V5LCB2YWx1ZSkgJT4lIAogIGdncGxvdChhZXMoeCA9IHZhbHVlLCB5ID0gbiwgZmlsbCA9IGtleSkpICsKICBnZW9tX2NvbChwb3NpdGlvbiA9ICJkb2RnZSIpICsKICBnZW9tX3ZsaW5lKHhpbnRlcmNlcHQgPSBjcml0X3ZhbF8xLCBjb2wgPSAicmVkIiwgbHdkID0gMikgKwogIHNjYWxlX3lfY29udGludW91cyhsYWJlbHMgPSBzY2FsZXM6OmNvbW1hKSArCiAgdGhlbWVfYncoKSArCiAgdGhlbWUobGVnZW5kLnBvc2l0aW9uID0gImJvdHRvbSIpICsKICBsYWJzKHkgPSAiQ291bnQiLAogICAgICAgeCA9ICJDb3JyZWN0IE0mTXMiLAogICAgICAgZmlsbCA9ICJIeXBvdGhlc2lzIikKYGBgCgoKYGBge3J9CnBvd2VyXzIgPC0gcHJvcF9jaShoYV9kaXN0XzIgPj0gY3JpdF92YWxfMikKcG93ZXJfMgpgYGAKCmBgYHtyfQojIE51bWJlciBvZiBNJk1zIGNvcnJlY3RseSBpZGVudGlmaWVkCm9ic2VydmVkX3N0YXQgPC0gNwoKcF92YWxfMSA8LSBwcm9wX2NpKGgwX2Rpc3RfMSA+PSBvYnNlcnZlZF9zdGF0KQpwX3ZhbF8xCmBgYAoKYGBge3J9CnBfdmFsXzIgPC0gcHJvcF9jaShoMF9kaXN0XzIgPj0gb2JzZXJ2ZWRfc3RhdCkKcF92YWxfMgpgYGAK